Run Length Encoding compressor program 8 bit header version
Written by Shaun Case 1991 in Borland C++ 2.0
with sizeof (int) == 2
This program and its source code are Public Domain.
This program should be portable to any machine with
2-byte short ints and 8-bit bytes, once the filename
handling (which is MS-DOS specific) is patched.
What is run length encoding?
Run Length Encoding, also known as RLE, is a method of compressing data
that has a lot of "runs" of bytes (or bits) in it. A "run" is a series
of bytes that are all the same. For instance, the string "THIS IS A
VEEEEEEEEEEEEEEEEEEEEEEEERY INTERESTING SENTENCE" has a run of 23 'E's
in it. This could be compressed in the following manner:
THIS IS A V23ERY INTERESTING SENTENCE
resulting in a saving of 20 characters (the 23 'E's shrink to the three
characters "23E"). One more character can be saved by replacing the
two-character sequence "23" with a single byte whose value is 23.
However, if the text to be encoded is arbitrary, it may contain digits
as well as letters, and in fact bytes of every possible value. A decoder
that sees "23E" cannot tell whether it is a compressed run or literal
text. There must therefore be some way to tell the decoder when it has
reached a compressed run and when it has reached a sequence to be passed
straight through. The following file format was used for this purpose:
========= tech info =========
8 bit header version.
File format:
13 byte original filename, followed by
[ 8 bit header + data ][ 8 bit header + data ][ 8 bit header + data ]
etc.
header:
bit 7     : 1 if the following byte is a run
bit 6 - 0 : length of run (max 127, min 3)
data: 1 byte : the character the run consists of
*** OR ***
header:
bit 7     : 0 if the following bytes are a sequence
bit 6 - 0 : length of sequence (max 127)
data: (header AND 0x7F) bytes of data,
      copied to the output stream unchanged
===============================
bugs:
None known
Nasty features:
1) When the encoder reaches the maximum run length, the
run is written out correctly, but it is followed by a
1-length run of the next byte. Odd. Reason unknown.
2) Better compression could be achieved by having the
minimum run length and sequence length understood
to be 2. This would allow an "understood" multiplication
of seq_len or run_len by 2, since a length of 1 is never
used, allowing sequences of 254 bytes. This is unlikely
to give much better compression in most cases,
and is left as an exercise for the reader.
Implementing it requires fixing 1) above, too.
Author: atman%ecst.csuchico.edu@RELAY.CS.NET (internet)
1@9651 (WWIVnet)
atman of 1:119/666.0 (fidonet)
Tell me hi if you use this program!